---
sidebar_class_name: node-only
---

# Llama CPP

:::tip Compatibility
Only available on Node.js.
:::

This module is based on the [node-llama-cpp](https://github.com/withcatai/node-llama-cpp) Node.js bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp), allowing you to work with a locally running LLM. This allows you to work with a much smaller quantized model capable of running on a laptop environment, ideal for testing and scratch padding ideas without running up a bill!

## Setup

You'll need to install the [node-llama-cpp](https://github.com/withcatai/node-llama-cpp) module to communicate with your local model.

```bash npm2yarn
npm install -S node-llama-cpp
```

import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

```bash npm2yarn
npm install @langchain/community
```

You will also need a local Llama 2 model (or a model supported by [node-llama-cpp](https://github.com/withcatai/node-llama-cpp)). You will need to pass the path to this model to the LlamaCpp module as a part of the parameters (see example).

Out-of-the-box `node-llama-cpp` is tuned for running on a MacOS platform with support for the Metal GPU of Apple M-series of processors. If you need to turn this off or need support for the CUDA architecture then refer to the documentation at [node-llama-cpp](https://withcatai.github.io/node-llama-cpp/).

A note to LangChain.js contributors: if you want to run the tests associated with this module you will need to put the path to your local model in the environment variable `LLAMA_PATH`.

## Guide to installing Llama2

Getting a local Llama2 model running on your machine is a pre-req so this is a quick guide to getting and building Llama 7B (the smallest) and then quantizing it so that it will run comfortably on a laptop. To do this you will need `python3` on your machine (3.11 is recommended), also `gcc` and `make` so that `llama.cpp` can be built.

### Getting the Llama2 models

To get a copy of Llama2 you need to visit [Meta AI](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and request access to their models. Once Meta AI grant you access, you will receive an email containing a unique URL to access the files, this will be needed in the next steps.
Now create a directory to work in, for example:

```
mkdir llama2
cd llama2
```

Now we need to get the Meta AI `llama` repo in place so we can download the model.

```
git clone https://github.com/facebookresearch/llama.git
```

Once we have this in place we can change into this directory and run the downloader script to get the model we will be working with. Note: From here on its assumed that the model in use is `llama-2–7b`, if you select a different model don't forget to change the references to the model accordingly.

```
cd llama
/bin/bash ./download.sh
```

This script will ask you for the URL that Meta AI sent to you (see above), you will also select the model to download, in this case we used `llama-2–7b`. Once this step has completed successfully (this can take some time, the `llama-2–7b` model is around 13.5Gb) there should be a new `llama-2–7b` directory containing the model and other files.

### Converting and quantizing the model

In this step we need to use `llama.cpp` so we need to download that repo.

```
cd ..
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
```

Now we need to build the `llama.cpp` tools and set up our `python` environment. In these steps it's assumed that your install of python can be run using `python3` and that the virtual environment can be called `llama2`, adjust accordingly for your own situation.

```
make
python3 -m venv llama2
source llama2/bin/activate
```

After activating your llama2 environment you should see `(llama2)` prefixing your command prompt to let you know this is the active environment. Note: if you need to come back to build another model or re-quantize the model don't forget to activate the environment again also if you update `llama.cpp` you will need to rebuild the tools and possibly install new or updated dependencies! Now that we have an active python environment, we need to install the python dependencies.

```
python3 -m pip install -r requirements.txt
```

Having done this, we can start converting and quantizing the Llama2 model ready for use locally via `llama.cpp`.
First, we need to convert the model, prior to the conversion let's create a directory to store it in.

```
mkdir models/7B
python3 convert.py --outfile models/7B/gguf-llama2-f16.bin --outtype f16 ../../llama2/llama/llama-2-7b --vocab-dir ../../llama2/llama/llama-2-7b
```

This should create a converted model called `gguf-llama2-f16.bin` in the directory we just created. Note that this is just a converted model so it is also around 13.5Gb in size, in the next step we will quantize it down to around 4Gb.

```
./quantize ./models/7B/gguf-llama2-f16.bin ./models/7B/gguf-llama2-q4_0.bin q4_0
```

Running this should result in a new model being created in the `models\7B` directory, this one called `gguf-llama2-q4_0.bin`, this is the model we can use with langchain. You can validate this model is working by testing it using the `llama.cpp` tools.

```
./main -m ./models/7B/gguf-llama2-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
```

Running this command fires up the model for a chat session. BTW if you are running out of disk space this small model is the only one we need, so you can backup and/or delete the original and converted 13.5Gb models.

## Usage

import CodeBlock from "@theme/CodeBlock";
import LlamaCppExample from "@examples/models/llm/llama_cpp.ts";

<CodeBlock language="typescript">{LlamaCppExample}</CodeBlock>

## Streaming

import LlamaCppStreamExample from "@examples/models/llm/llama_cpp_stream.ts";

<CodeBlock language="typescript">{LlamaCppStreamExample}</CodeBlock>;
